328 PART 6 Analyzing Survival Data
Knowing When to Use Survival Regression
In Chapter 21, we examine the special problems that come up when the researcher
can’t continue to collect data during follow-up on a participant long enough to
observe whether they ever experience the event being studied. To recap, in
this situation, you treat the participant’s data as censored. This means you record that
the participant was observed event-free for only a limited amount of time before being lost to
follow-up. In that chapter, we also explain how to summarize survival data using
life tables and the Kaplan-Meier method, and how to graph time-to-event data as
survival curves. In Chapter 22, we describe the log-rank test, which you can use to
compare survival among a small number of groups — for example, participants
taking drug versus placebo, or participants initially diagnosed at four different
stages of the same cancer.
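As a reminder of how censoring enters the Kaplan-Meier calculation, here is a minimal sketch in plain Python. The function name `kaplan_meier`, the follow-up times, and the event flags are all hypothetical and made up for illustration; in practice you would use statistical software rather than this hand-rolled version. The sketch assumes the usual convention that a participant censored at a given time is still counted as at risk for events at that same time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each participant
    events -- 1 if the event was observed at that time, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = event_counts.get(t, 0)
        if d > 0:
            # Survival only steps down at times where an event occurred.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Censored participants leave the risk set without a step down.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up times in months; 0 marks a censored participant.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Notice that the censored participants never lower the curve directly. They only shrink the number at risk, which makes each later event count for a larger step.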
But the log-rank test has limitations:
» The log-rank test doesn’t handle numerical predictors well. Because this
test compares survival among a small number of categories, it does not work
well for a numerical variable like age. To compare survival among different
age groups with the log-rank test, you would first have to categorize the
participants into age ranges. The age ranges you choose for your groups
should be based on your research question. Because categorizing discards the
granularity of the data, the test may be less efficient at detecting gradual
trends across the whole age range.
» The log-rank test doesn’t let you analyze the simultaneous effect of
different predictors. If you try to create subgroups of participants for each
distinct combination of categories for more than one predictor (such as three
treatment groups and three diagnostic groups), you will quickly see that you
have too many groups and not enough participants in each group to support
the test. In this example, with three treatment groups and three
diagnostic groups, you would have 3 × 3 = 9 groups, which is already
too many for a log-rank test to be useful. Even with 100 participants in
your study, dividing them into nine groups leaves only about 11
participants per group on average, making each subgroup estimate unstable.
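The arithmetic behind both limitations can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the age cutpoints (40 and 60), the group labels, and the helper `age_group` are hypothetical, and your own cutpoints should come from your research question.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical age cutpoints and labels, chosen only for illustration.
AGE_CUTPOINTS = [40, 60]
AGE_LABELS = ["<40", "40-59", "60+"]


def age_group(age):
    """Map a numerical age to one of the hypothetical age ranges."""
    return AGE_LABELS[bisect_right(AGE_CUTPOINTS, age)]


# Ages 41 and 59 land in the same group, so the 18-year gap is lost.
print(age_group(41), age_group(59))

# Crossing two three-level predictors multiplies the number of subgroups.
treatments = ["A", "B", "C"]     # hypothetical treatment labels
diagnoses = ["I", "II", "III"]   # hypothetical diagnostic labels
subgroups = list(product(treatments, diagnoses))
n_groups = len(subgroups)

# With 100 participants, see how many each subgroup gets on average.
n_participants = 100
per_group, left_over = divmod(n_participants, n_groups)
print(f"{n_groups} subgroups, about {per_group} participants each")
```

Adding a third three-level predictor would triple the subgroup count to 27, leaving fewer than four participants per subgroup on average.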
Use survival regression when the outcome (the Y variable) is a time-to-event
variable, like survival time. Survival regression lets you do all of the following,
either in separate models or simultaneously:
» Determine whether there is a statistically significant association between
survival and one or more other predictor variables